On 12/10/23 11:23, Scott Dorsey wrote:
> The lack of access to the latter from any source at all.
ACK
> I mean, I can
> cite the article (if I could find it), but there is no point in citing
> something that nobody can read.
I think there is some value in saying something and saying where it came
from. It offers a modicum of veracity and enables the reader to fact
check. What the reader does and does not have access to is not your
primary problem. Though it does behoove you to cite sources that you
know the reader has access to.
> Dejanews provided exactly what was asked for, and so did Google for a
> while, until they "improved" things to the point where it was completely
> broken.
ACK
> Committing the resources is an issue, but not the big one.
I think the necessary resources are going to be a bigger issue than many
people expect.
INN's traditional spool directories are going to tax just about any file
system when you try to put hundreds of thousands of files in a single
directory. Some newsgroups will have significantly more articles than
others.
Sure, there are ways to store articles so that they aren't all in one
directory. But that's now a news server change. It can happen, but
it's not as simple as a configuration change. It will likely be a code
and a configuration change.
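To illustrate the "not all in one directory" idea, here is a minimal
sketch that spreads articles across nested directories derived from a
hash of the Message-ID. It is not how INN or any other server does it;
the function name, the root path, and the two-level layout are
assumptions made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, message_id: str, levels: int = 2) -> PurePosixPath:
    """Map a Message-ID to root/ab/cd/<full sha1 hex digest>.

    Hypothetical layout: each level uses one byte (two hex characters)
    of the digest, so two levels give 256 * 256 = 65,536 directories.
    """
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # The first `levels` byte pairs of the digest become directory names.
    shards = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *shards, digest)


if __name__ == "__main__":
    # The Message-ID below is a made-up example.
    print(sharded_path("/var/spool/archive", "<example-1234@news.example.com>"))
```

With two levels, even 100 million articles would average roughly 1,500
files per directory. The trade-off is that a newsgroup's articles are
no longer grouped together, so the server needs a separate index from
newsgroup and article number to Message-ID.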
There are other message stores, but the ones that I'm aware of tend to
be cyclical in nature and of a fixed size, which is antithetical to the
archive forever goal.
This is probably a case for a custom NNTP server that is really a
gateway (of sorts) to some sort of object store that is distributed and
designed to scale to millions of objects in a container (newsgroup).
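A rough sketch of what I mean by a gateway: the NNTP front end only
translates (newsgroup, article number) into an object key, and the
object store does the real work. Everything here is hypothetical; the
class and method names are mine, and a plain dict stands in for the
distributed object store.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ArchiveGateway:
    # Stand-in for a distributed object store: key -> raw article bytes.
    objects: Dict[str, bytes] = field(default_factory=dict)
    # Next article number to hand out in each newsgroup (container).
    next_number: Dict[str, int] = field(default_factory=dict)

    def post(self, group: str, article: bytes) -> int:
        """Store an article and return its number within the group."""
        number = self.next_number.get(group, 1)
        self.objects[f"{group}/{number}"] = article
        self.next_number[group] = number + 1
        return number

    def article(self, group: str, number: int) -> Optional[bytes]:
        """Fetch an article by group and number, or None if absent."""
        return self.objects.get(f"{group}/{number}")


if __name__ == "__main__":
    gw = ArchiveGateway()
    n = gw.post("comp.misc", b"Subject: test\r\n\r\nbody\r\n")
    print(n, gw.article("comp.misc", n))
```

Because the front end holds no articles itself, the dict could be
swapped for a real object store client without changing what NNTP
clients see.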
Whatever is done needs to be flexible and have the ability to be
reconfigured as things grow. It should also have a little bit of
redundancy, because the more systems that are added to it, the more
fragile it will become.
> The real problem is availability of the data.
I agree that's probably the /primary/ problem and supersedes the storage
problem, inasmuch as I believe that computer science / systems people can
overcome the storage problem mentioned above. I'm not as confident that we can
recover a full archive of Usenet any more. After all, according to
Wikipedia, we're talking about 43+ years of data. Much of that data was
deemed ephemeral by most people. Much of that data was difficult to
collect 20+ years ago. The intervening 20 years won't have helped the
matter.
I maintain that storage of and access to the data is the /second/
largest problem.
> When dejanews was created an enormous amount of
> effort went into finding pre-dejanews material and passing it off to dejanews.
> At this point, most of that likely no longer exists except in google's
> database. And there's no point in it being in there if people can't find it.
Eh ... given Google's propensity to not get rid of things, I sort of
suspect that the old DeJa News archive probably still exists
somewhere. It may actually be directly behind Google Groups Usenet
gateway and the gateway itself may be munging articles as they are
presented. Google really is averse to destroying data. Getting
to that data is an entirely different issue.